Exploratory Data Analysis using {gtsummary} package
Motivation
The replication crisis (also called the replicability crisis or the reproducibility crisis) is an ongoing methodological crisis in which the results of many scientific studies are difficult or impossible to reproduce.
The reproducibility crisis leads to:
- Low-quality medical research
- Low-quality code that contains errors
- Laborious, time-consuming attempts to reproduce results
Raw data to summary table
SPSS output to summary table
R output to summary table
- Thus, the gtsummary package was developed to help R users with little coding experience produce presentation-ready tables that are reproducible and customizable.
Image source: Happy R adapted from artwork by @allison_horst; the beach and cocktail images are from pngtree.com
Introduction
Overview of {gtsummary} package
- The {gtsummary} package provides an elegant and flexible way to create publication-ready analytical and summary tables using the R programming language.
- Developed by Daniel D. Sjoberg et al.
- Uses the gt package as its backend to produce highly reproducible, presentation-ready tables.
- Latest version: 1.7.2 (2023-07-15)
- Requirement: R ≥ 3.4
Import Packages
| broom (>= 0.8.0) | broom.helpers (>= 1.9.0) | cli (>= 3.1.1) |
| dplyr (>= 1.0.7) | forcats (>= 0.5.1) | glue (>= 1.6.0) |
| gt (>= 0.7.0) | knitr (>= 1.37) | lifecycle (>= 1.0.1) |
| purrr (>= 0.3.4) | rlang (>= 1.0.3) | stringr (>= 1.4.0) |
| tibble (>= 3.1.6) | tidyr (>= 1.1.4) |
Function
- Creates default tabular summaries with highly customizable capabilities
- Summarize data frames (survival data, survey data)
- Cross-tabulation
- Summarize regression models (linear, logistic and survival)
- Report statistics from gtsummary tables in-line in R Markdown
- Stack and/or merge any table type
- Standardize themes across tables
- Choose different print engines
Analysis Workflow
| Function | Customization: data arrangement | Customization: additional information | Customization: table cosmetics | Print engines | Theme |
|---|---|---|---|---|---|
| tbl_summary | by | add_* | modify_* | as_gt | reset_gtsummary_theme |
| tbl_cross | type | | bold_* | as_flex_table | theme_gtsummary_journal(journal = "lancet"); "jama" and other journals are also available |
| tbl_uvregression | statistic | | italicize_* | as_hux_table | |
| tbl_regression | label | | | as_kable_extra | |
| tbl_merge | | | | as_kable | |
| tbl_stack | | | | as_tibble | |
library(haven)
stroke <- read.csv("rconf.csv", stringsAsFactors = TRUE)
stroke$age <- as.numeric(stroke$age)
summary(stroke)
## age sex ethnicity married dm hpt
## Min. :24.00 Female:123 Chinese: 34 Divorce: 8 No :170 No : 69
## 1st Qu.:54.00 Male :196 Indian : 25 Married:168 Yes :137 Yes :242
## Median :63.00 Malay :259 Single : 6 NA's: 12 NA's: 8
## Mean :63.12 Others : 1 NA's :137
## 3rd Qu.:73.00
## Max. :95.00
## NA's :8
## ckd af hf.IHD lipid smoke
## :129 : 14 :129 No :221 No :285
## No :189 No :292 No :175 Yes : 85 Yes : 29
## Yes : 1 Yes : 13 Yes : 15 NA's: 13 NA's: 5
##
##
##
##
## WHO dodiag gcs
## ICH : 14 Missing : 26 15 :243
## Intracerebral Hemorrhage (ICH): 21 15/03/2021: 6 10 : 15
## Ischaemic :281 01/03/2021: 4 13 : 12
## SAH : 3 03/01/2021: 4 11 : 11
## 03/03/2021: 4 12 : 10
## 04/01/2021: 4 9 : 9
## (Other) :271 (Other): 19
## nihss mrs
## Minor stroke (1-4) :130 Moderately severe disability:85
## Moderate stroke (5-15) : 86 No significant disability :60
## Moderate to severe stroke (16-20): 20 Slight disability :53
## No stroke symptoms (0) : 45 Severe disability :37
## Severe stroke (21-42) : 17 Moderate disability :35
## NA's : 21 (Other) :23
## NA's :26
## iv_thrombolysis iv_thrombectomy status_dis dodis status_f.u
## No :315 No:319 Alive:306 :129 Alive:228
## Yes: 4 Death: 13 25/01/2018: 4 Died : 91
## 02/03/2021: 3
## 07/10/2020: 3
## 08/01/2018: 3
## 10/01/2018: 3
## (Other) :174
## dodeath Sebab.Kematian
## 22/11/2022:228 :228
## 01/06/2021: 2 SAKIT TUA : 36
## 08/02/2021: 2 STROK : 5
## 08/04/2022: 2 SAKIT STROK : 4
## 12/03/2021: 2 STROKE : 4
## 12/06/2022: 2 DARAH TINGGI: 2
## (Other) : 81 (Other) : 40
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(gtsummary)
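# Dates are stored as dd/mm/yyyy text, so the format is stated explicitly
stroke$dodiag <- as.Date(stroke$dodiag, format = "%d/%m/%Y")
stroke$dodeath <- as.Date(stroke$dodeath, format = "%d/%m/%Y")
stroke$dodis <- as.Date(stroke$dodis, format = "%d/%m/%Y")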
# Follow-up time from diagnosis to death (or last follow-up), in days and months
stroke <- stroke %>%
  mutate(dur = as.duration(dodiag %--% dodeath),
         dur_days = dur / ddays(1),
         dur_month = dur_days / 30.417)
str(stroke)
## 'data.frame': 319 obs. of 26 variables:
## $ age : num 52 73 64 58 64 64 66 52 76 45 ...
## $ sex : Factor w/ 2 levels "Female","Male ": 2 2 2 2 2 1 2 2 2 2 ...
## $ ethnicity : Factor w/ 4 levels "Chinese","Indian",..: 3 3 3 3 3 3 3 3 1 3 ...
## $ married : Factor w/ 3 levels "Divorce","Married",..: 2 2 2 2 2 2 1 2 2 2 ...
## $ dm : Factor w/ 2 levels "No","Yes ": 1 2 2 2 1 2 2 1 1 2 ...
## $ hpt : Factor w/ 2 levels "No","Yes ": 2 2 1 2 2 2 1 1 2 1 ...
## $ ckd : Factor w/ 3 levels "","No","Yes ": 2 2 2 2 2 2 2 2 2 2 ...
## $ af : Factor w/ 3 levels "","No","Yes ": 2 2 2 2 2 2 2 2 2 2 ...
## $ hf.IHD : Factor w/ 3 levels "","No","Yes ": 2 2 3 2 2 2 2 2 2 2 ...
## $ lipid : Factor w/ 2 levels "No","Yes": 1 1 1 2 2 1 1 1 1 1 ...
## $ smoke : Factor w/ 2 levels "No","Yes ": 1 1 1 1 1 1 1 1 1 1 ...
## $ WHO : Factor w/ 4 levels "ICH","Intracerebral Hemorrhage (ICH)",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ dodiag : Date, format: "0012-09-20" "0022-09-20" ...
## $ gcs : Factor w/ 13 levels "10","11","12",..: 6 4 6 6 6 3 6 6 6 6 ...
## $ nihss : Factor w/ 5 levels "Minor stroke (1-4)",..: 2 2 2 2 1 1 1 2 1 1 ...
## $ mrs : Factor w/ 7 levels "Died (dischrage)",..: 3 7 6 3 4 7 7 3 7 4 ...
## $ iv_thrombolysis: Factor w/ 2 levels "No","Yes": 1 1 1 1 1 1 1 1 1 1 ...
## $ iv_thrombectomy: Factor w/ 1 level "No": 1 1 1 1 1 1 1 1 1 1 ...
## $ status_dis : Factor w/ 2 levels "Alive","Death": 1 1 1 1 1 1 1 1 1 1 ...
## $ dodis : Date, format: "0018-09-20" "0024-09-20" ...
## $ status_f.u : Factor w/ 2 levels "Alive","Died": 1 1 2 1 1 2 1 1 1 1 ...
## $ dodeath : Date, format: "0022-11-20" "0022-11-20" ...
## $ Sebab.Kematian : Factor w/ 44 levels "","ACUTE HEMORRHAGIC STROKE",..: 1 1 26 1 1 40 1 1 1 1 ...
## $ dur :Formal class 'Duration' [package "lubridate"] with 1 slot
## .. ..@ .Data: num 3.21e+08 5.27e+06 -4.29e+08 5.65e+08 5.34e+08 ...
## $ dur_days : num 3713 61 -4962 6544 6179 ...
## $ dur_month : num 122.07 2.01 -163.13 215.14 203.14 ...
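The Date columns in the str() output above show years such as 0012 and 0022, and some durations are negative. That pattern is what as.Date() returns when day-first text is parsed without a format. A minimal base-R sketch on made-up values (raw_diag and raw_death are hypothetical, chosen only to mimic the dd/mm/yyyy layout seen in the summary() output) compares the two calls:

```r
# Made-up values in the same dd/mm/yyyy layout as dodiag and dodeath
raw_diag  <- factor("12/09/2021")
raw_death <- factor("22/11/2022")

# No format given: as.Date() tries year-first layouts and reads "12" as the year
naive <- as.Date(raw_diag)

# Day-first format stated explicitly
diag_date  <- as.Date(raw_diag,  format = "%d/%m/%Y")
death_date <- as.Date(raw_death, format = "%d/%m/%Y")

format(naive)       # a date in year 12, not 2021
format(diag_date)   # the intended calendar date

# Follow-up in days and in months (same 30.417 divisor as the text)
dur_days  <- as.numeric(difftime(death_date, diag_date, units = "days"))
dur_month <- dur_days / 30.417
```

If the real columns follow the same layout, stating the format would remove the negative durations; that is an assumption about the raw file, which is not shown here.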
Application and examples
Descriptive analysis
- tbl_summary()
| Characteristic | N = 266¹ |
|---|---|
| age | 64 (55, 73) |
| sex | |
| Female | 103 (39%) |
| Male | 163 (61%) |
| dm | |
| No | 138 (52%) |
| Yes | 128 (48%) |
| hpt | |
| No | 48 (18%) |
| Yes | 218 (82%) |
| nihss | |
| No stroke symptoms (0) | 19 (7.1%) |
| Minor stroke (1-4) | 126 (47%) |
| Moderate stroke (5-15) | 85 (32%) |
| Moderate to severe stroke (16-20) | 19 (7.1%) |
| Severe stroke (21-42) | 17 (6.4%) |
| status_f.u | |
| Alive | 186 (70%) |
| Died | 80 (30%) |
| dur_days | 2,191 (-311, 5,075) |
| dur_month | 72 (-10, 167) |
| ¹ Median (IQR); n (%) | |
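The call that produced the table above is not shown, and the stroke2_complete object used in later code is never defined in the text (with N = 266 it is presumably a complete-case subset of the 319 rows, but that is a guess). As a stand-in, here is a minimal sketch of a default tbl_summary() call on the `trial` data set that ships with gtsummary; the columns age, grade and response belong to `trial`, not to the stroke data.

```r
library(gtsummary)

# `trial` is bundled with gtsummary and stands in for the stroke data here
tbl_overall <- tbl_summary(
  trial,
  include = c(age, grade, response),  # columns to summarise
  missing = "no"                      # hide the rows counting missing values
)

tbl_overall
```

With no other arguments, continuous variables are summarised by a median with quartiles and categorical variables by counts and percentages, which matches the footnote of the table above.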
- tbl_cross()
tbl_cross(stroke2_complete,
          row = sex,
          col = status_f.u,
          percent = "row",
          margin = "row") %>%
  add_p(source_note = TRUE)
| | status_f.u: Alive | status_f.u: Died |
|---|---|---|
| sex | | |
| Female | 62 (60%) | 41 (40%) |
| Male | 124 (76%) | 39 (24%) |
| Total | 186 (70%) | 80 (30%) |
| Pearson’s Chi-squared test, p=0.006 | | |
Binary logistic regression
Default output for regression analysis in R
default_stroke <- glm(status_f.u~age + sex, data = stroke2_complete, family = binomial(link = logit))
summary(default_stroke)
##
## Call:
## glm(formula = status_f.u ~ age + sex, family = binomial(link = logit),
## data = stroke2_complete)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -4.93939 0.89754 -5.503 3.73e-08 ***
## age 0.06785 0.01276 5.316 1.06e-07 ***
## sexMale -0.64117 0.29368 -2.183 0.029 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 325.32 on 265 degrees of freedom
## Residual deviance: 283.23 on 263 degrees of freedom
## AIC: 289.23
##
## Number of Fisher Scoring iterations: 4
Logistic regression using gtsummary-univariate analysis
uvlog <- stroke2_complete %>% select(age, sex, dm, hpt, nihss, status_f.u) %>%
tbl_uvregression(method = glm,
y=status_f.u,
method.args = list(family = binomial),
exponentiate = TRUE)
uvlog
| Characteristic | N | OR¹ | 95% CI¹ | p-value |
|---|---|---|---|---|
| age | 266 | 1.07 | 1.05, 1.10 | <0.001 |
| sex | 266 | | | |
| Female | | — | — | |
| Male | | 0.48 | 0.28, 0.81 | 0.006 |
| dm | 266 | | | |
| No | | — | — | |
| Yes | | 1.85 | 1.09, 3.16 | 0.024 |
| hpt | 266 | | | |
| No | | — | — | |
| Yes | | 1.56 | 0.77, 3.37 | 0.2 |
| nihss | 266 | | | |
| No stroke symptoms (0) | | — | — | |
| Minor stroke (1-4) | | 0.32 | 0.11, 1.00 | 0.040 |
| Moderate stroke (5-15) | | 1.44 | 0.52, 4.45 | 0.5 |
| Moderate to severe stroke (16-20) | | 2.98 | 0.81, 11.9 | 0.11 |
| Severe stroke (21-42) | | 7.04 | 1.72, 34.6 | 0.010 |
| ¹ OR = Odds Ratio, CI = Confidence Interval | | | | |
Logistic regression using gtsummary-multivariate analysis
mvlog <- glm(status_f.u~age+sex+dm+hpt+nihss,
stroke2_complete,
family = binomial)
mvlog %>% tbl_regression(
exponentiate=TRUE
)
| Characteristic | OR¹ | 95% CI¹ | p-value |
|---|---|---|---|
| age | 1.07 | 1.04, 1.10 | <0.001 |
| sex | |||
| Female | — | — | |
| Male | 0.61 | 0.32, 1.16 | 0.13 |
| dm | |||
| No | — | — | |
| Yes | 1.73 | 0.89, 3.41 | 0.11 |
| hpt | |||
| No | — | — | |
| Yes | 0.74 | 0.29, 1.91 | 0.5 |
| nihss | |||
| No stroke symptoms (0) | — | — | |
| Minor stroke (1-4) | 0.44 | 0.13, 1.55 | 0.2 |
| Moderate stroke (5-15) | 1.92 | 0.61, 6.62 | 0.3 |
| Moderate to severe stroke (16-20) | 4.54 | 1.03, 22.0 | 0.051 |
| Severe stroke (21-42) | 6.28 | 1.30, 35.8 | 0.028 |
| ¹ OR = Odds Ratio, CI = Confidence Interval | |||
Survival analysis
Survival rate using tbl_survfit
library(survival)
library(gtsummary)
fit1 <- survfit(Surv(dur_month, status_f.u)~ 1, stroke2_complete)
fit2 <- survfit(Surv(dur_month, status_f.u)~ sex, stroke2_complete)
life_table <- list(fit1, fit2) %>%
tbl_survfit(times= c(1, 12, 36)) %>%
modify_header(update = list(
stat_1 ~ "**1-month**",
stat_2 ~ "**1-year**",
stat_3 ~ "**3-years**"
)) %>%
add_n()
## tbl_survfit: Multi-state model detected. Showing probabilities into state 'Died'
## tbl_survfit: Multi-state model detected. Showing probabilities into state 'Died'
| Characteristic | N | 1-month | 1-year | 3-years |
|---|---|---|---|---|
| Overall | 266 | 0% (0%, 0%) | 11% (8.0%, 16%) | 14% (10%, 19%) |
| sex | 266 | | | |
| Female | | 0% (0%, 0%) | 16% (10%, 25%) | 18% (12%, 28%) |
| Male | | 0% (0%, 0%) | 8.4% (5.0%, 14%) | 11% (6.8%, 17%) |
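The "Multi-state model detected" message appears to come from passing the factor status_f.u directly to Surv(): a factor event makes a multi-state response, so the table above reports the probability of having died by each time point rather than the probability of surviving. The sketch below, on made-up data (toy, months and status are hypothetical names), shows the difference using only the survival package.

```r
library(survival)

# Made-up follow-up data; "Alive" is the first level, so it acts as censoring
toy <- data.frame(
  months = c(2, 5, 8, 12, 20, 36),
  status = factor(c("Died", "Alive", "Died", "Alive", "Died", "Alive"),
                  levels = c("Alive", "Died"))
)

# Factor event: Surv() builds a multi-state response
s_factor <- with(toy, Surv(months, status))
# Logical event: ordinary right-censored survival response
s_binary <- with(toy, Surv(months, status == "Died"))

attr(s_factor, "type")
attr(s_binary, "type")

# Kaplan-Meier estimate with the logical event
km <- survfit(Surv(months, status == "Died") ~ 1, data = toy)
surv_12  <- summary(km, times = 12)$surv   # probability of surviving to 12 months
death_12 <- 1 - surv_12                    # probability of having died by 12 months
```

Writing the event as status == "Died", as the Cox models below do, gives survival probabilities; the factor form gives the complementary death probabilities when death is the only event.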
Semi-parametric survival using gtsummary-univariate analysis
library(survival)
cox_uv <- tbl_uvregression(
stroke2_complete[c("dur_month", "status_f.u", "age", "sex", "dm",
"hpt", "nihss")],
method = coxph,
y = Surv(time = dur_month, event = status_f.u=='Died'),
exponentiate = TRUE
)
cox_uv
| Characteristic | N | HR¹ | 95% CI¹ | p-value |
|---|---|---|---|---|
| age | 266 | 1.05 | 1.03, 1.06 | <0.001 |
| sex | 266 | | | |
| Female | | — | — | |
| Male | | 0.48 | 0.30, 0.74 | 0.001 |
| dm | 266 | | | |
| No | | — | — | |
| Yes | | 1.53 | 0.97, 2.39 | 0.066 |
| hpt | 266 | | | |
| No | | — | — | |
| Yes | | 1.32 | 0.70, 2.50 | 0.4 |
| nihss | 266 | | | |
| No stroke symptoms (0) | | — | — | |
| Minor stroke (1-4) | | 0.38 | 0.15, 0.99 | 0.047 |
| Moderate stroke (5-15) | | 1.32 | 0.55, 3.16 | 0.5 |
| Moderate to severe stroke (16-20) | | 1.66 | 0.61, 4.50 | 0.3 |
| Severe stroke (21-42) | | 4.12 | 1.56, 10.9 | 0.004 |
| ¹ HR = Hazard Ratio, CI = Confidence Interval | | | | |
Semi-parametric survival using gtsummary-multivariate analysis
cox_mv <- coxph(Surv(time = dur_month, event = status_f.u=='Died')~
age+sex+dm+hpt+nihss,
stroke2_complete) %>%
tbl_regression(exponentiate=TRUE)
cox_mv
| Characteristic | HR¹ | 95% CI¹ | p-value |
|---|---|---|---|
| age | 1.04 | 1.02, 1.06 | <0.001 |
| sex | |||
| Female | — | — | |
| Male | 0.60 | 0.38, 0.96 | 0.033 |
| dm | |||
| No | — | — | |
| Yes | 1.88 | 1.14, 3.10 | 0.013 |
| hpt | |||
| No | — | — | |
| Yes | 0.60 | 0.30, 1.18 | 0.14 |
| nihss | |||
| No stroke symptoms (0) | — | — | |
| Minor stroke (1-4) | 0.68 | 0.25, 1.88 | 0.5 |
| Moderate stroke (5-15) | 2.20 | 0.88, 5.53 | 0.092 |
| Moderate to severe stroke (16-20) | 2.64 | 0.93, 7.48 | 0.067 |
| Severe stroke (21-42) | 7.40 | 2.56, 21.4 | <0.001 |
| ¹ HR = Hazard Ratio, CI = Confidence Interval | |||
Customization
{gtsummary} + formulas
Data arrangement
stroke3 <- stroke2_complete %>% select(age, sex, dm, hpt, nihss)
desc <- tbl_summary(stroke3,
                    by = sex,
                    label = list(age ~ "Age",
                                 sex ~ "Gender",
                                 dm ~ "Diabetes Mellitus",
                                 hpt ~ "Hypertension",
                                 nihss ~ "NIHSS Score"),
                    digits = list(all_continuous() ~ 1,
                                  all_categorical() ~ 0),
                    statistic = list(all_categorical() ~ "{n} ({p}%)",
                                     all_continuous() ~ "{mean} ({sd})"))
desc
| Characteristic | Female, N = 103¹ | Male, N = 163¹ |
|---|---|---|
| Age | 65.1 (14.9) | 62.9 (11.8) |
| Diabetes Mellitus | ||
| No | 51 (50%) | 87 (53%) |
| Yes | 52 (50%) | 76 (47%) |
| Hypertension | ||
| No | 13 (13%) | 35 (21%) |
| Yes | 90 (87%) | 128 (79%) |
| NIHSS Score | ||
| No stroke symptoms (0) | 6 (6%) | 13 (8%) |
| Minor stroke (1-4) | 45 (44%) | 81 (50%) |
| Moderate stroke (5-15) | 32 (31%) | 53 (33%) |
| Moderate to severe stroke (16-20) | 9 (9%) | 10 (6%) |
| Severe stroke (21-42) | 11 (11%) | 6 (4%) |
| ¹ Mean (SD); n (%) | ||
Add extra information
## add_q: Adjusting p-values with
## `stats::p.adjust(x$table_body$p.value, method = "fdr")`
| Characteristic | N | Female, N = 103¹ | Male, N = 163¹ | p-value² | q-value³ |
|---|---|---|---|---|---|
| Age | 266 | 65.1 (14.9) | 62.9 (11.8) | 0.12 | 0.2 |
| Diabetes Mellitus | 266 | | | 0.5 | 0.5 |
| No | | 51 (50%) | 87 (53%) | | |
| Yes | | 52 (50%) | 76 (47%) | | |
| Hypertension | 266 | | | 0.067 | 0.2 |
| No | | 13 (13%) | 35 (21%) | | |
| Yes | | 90 (87%) | 128 (79%) | | |
| NIHSS Score | 266 | | | 0.2 | 0.2 |
| No stroke symptoms (0) | | 6 (6%) | 13 (8%) | | |
| Minor stroke (1-4) | | 45 (44%) | 81 (50%) | | |
| Moderate stroke (5-15) | | 32 (31%) | 53 (33%) | | |
| Moderate to severe stroke (16-20) | | 9 (9%) | 10 (6%) | | |
| Severe stroke (21-42) | | 11 (11%) | 6 (4%) | | |
| ¹ Mean (SD); n (%) | | | | | |
| ² Wilcoxon rank sum test; Pearson’s Chi-squared test | | | | | |
| ³ False discovery rate correction for multiple testing | | | | | |
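The code behind this table is not shown. The N, p-value and q-value columns are the kind of output produced by the add_n(), add_p() and add_q() helpers, so a minimal sketch on gtsummary's bundled `trial` data (not the stroke data; trt, age, grade and stage are `trial` columns, and tbl_extra is an illustrative name) might look like this:

```r
library(gtsummary)

tbl_extra <- trial %>%
  tbl_summary(by = trt, include = c(age, grade, stage), missing = "no") %>%
  add_n() %>%               # column with the number of non-missing observations
  add_p() %>%               # p-value from a default test chosen per variable type
  add_q(method = "fdr")     # p-values adjusted for multiple testing

tbl_extra
```

The add_q() step prints the same `stats::p.adjust()` message seen above. Note that add_p() picks its tests from the variable type, not from the displayed statistic, which is why a table showing means can still carry a Wilcoxon rank sum footnote.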
Aesthetics
## add_q: Adjusting p-values with
## `stats::p.adjust(x$table_body$p.value, method = "fdr")`
| Characteristic | N | Female, N = 103¹ | Male, N = 163¹ | p-value² | q-value³ |
|---|---|---|---|---|---|
| Age | 266 | 65.1 (14.9) | 62.9 (11.8) | 0.12 | 0.2 |
| Diabetes Mellitus | 266 | | | 0.5 | 0.5 |
| No | | 51 (50%) | 87 (53%) | | |
| Yes | | 52 (50%) | 76 (47%) | | |
| Hypertension | 266 | | | 0.067 | 0.2 |
| No | | 13 (13%) | 35 (21%) | | |
| Yes | | 90 (87%) | 128 (79%) | | |
| NIHSS Score | 266 | | | 0.2 | 0.2 |
| No stroke symptoms (0) | | 6 (6%) | 13 (8%) | | |
| Minor stroke (1-4) | | 45 (44%) | 81 (50%) | | |
| Moderate stroke (5-15) | | 32 (31%) | 53 (33%) | | |
| Moderate to severe stroke (16-20) | | 9 (9%) | 10 (6%) | | |
| Severe stroke (21-42) | | 11 (11%) | 6 (4%) | | |
| ¹ Mean (SD); n (%) | | | | | |
| ² Wilcoxon rank sum test; Pearson’s Chi-squared test | | | | | |
| ³ False discovery rate correction for multiple testing | | | | | |
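The styling calls for this table are not shown, and the bold and italic formatting does not survive in plain text. A minimal sketch of the cosmetic helpers named in the workflow table (modify_*, bold_*, italicize_*), again on the bundled `trial` data and with tbl_pretty as an illustrative name:

```r
library(gtsummary)

tbl_pretty <- trial %>%
  tbl_summary(by = trt, include = c(age, grade), missing = "no") %>%
  modify_header(label = "**Variable**") %>%  # rename the first column header
  bold_labels() %>%                          # variable names in bold
  italicize_levels()                         # category levels in italics

tbl_pretty
```

These helpers change only how the table is rendered; the statistics in the body stay the same, which is consistent with the numbers above matching the previous table.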
Merging and stacking
Stacking and merging
Merging
cox_uv <- tbl_uvregression(
  stroke2_complete[c("dur_month", "status_f.u", "age", "sex", "dm",
                     "hpt", "nihss")],
  method = coxph,
  y = Surv(time = dur_month, event = status_f.u == "Died"),
  exponentiate = TRUE,
  label = list(age ~ "Age",
               sex ~ "Gender",
               dm ~ "Diabetes Mellitus",
               hpt ~ "Hypertension",
               nihss ~ "NIHSS Score")
)
cox_mv <- coxph(Surv(time = dur_month, event = status_f.u == "Died") ~
                  age + sex + dm + hpt + nihss,
                stroke2_complete) %>%
  tbl_regression(exponentiate = TRUE,
                 label = list(age ~ "Age",
                              sex ~ "Gender",
                              dm ~ "Diabetes Mellitus",
                              hpt ~ "Hypertension",
                              nihss ~ "NIHSS Score"))
# Place the univariable and multivariable tables side by side
tbl_surv_merge <- tbl_merge(
  list(cox_uv, cox_mv),
  tab_spanner = c("**Univariable**", "**Multivariable**")
)
tbl_surv_merge
| Characteristic | Univariable N | Univariable HR¹ | Univariable 95% CI¹ | Univariable p-value | Multivariable HR¹ | Multivariable 95% CI¹ | Multivariable p-value |
|---|---|---|---|---|---|---|---|
| Age | 266 | 1.05 | 1.03, 1.06 | <0.001 | 1.04 | 1.02, 1.06 | <0.001 |
| Gender | 266 | | | | | | |
| Female | | — | — | | — | — | |
| Male | | 0.48 | 0.30, 0.74 | 0.001 | 0.60 | 0.38, 0.96 | 0.033 |
| Diabetes Mellitus | 266 | | | | | | |
| No | | — | — | | — | — | |
| Yes | | 1.53 | 0.97, 2.39 | 0.066 | 1.88 | 1.14, 3.10 | 0.013 |
| Hypertension | 266 | | | | | | |
| No | | — | — | | — | — | |
| Yes | | 1.32 | 0.70, 2.50 | 0.4 | 0.60 | 0.30, 1.18 | 0.14 |
| NIHSS Score | 266 | | | | | | |
| No stroke symptoms (0) | | — | — | | — | — | |
| Minor stroke (1-4) | | 0.38 | 0.15, 0.99 | 0.047 | 0.68 | 0.25, 1.88 | 0.5 |
| Moderate stroke (5-15) | | 1.32 | 0.55, 3.16 | 0.5 | 2.20 | 0.88, 5.53 | 0.092 |
| Moderate to severe stroke (16-20) | | 1.66 | 0.61, 4.50 | 0.3 | 2.64 | 0.93, 7.48 | 0.067 |
| Severe stroke (21-42) | | 4.12 | 1.56, 10.9 | 0.004 | 7.40 | 2.56, 21.4 | <0.001 |
| ¹ HR = Hazard Ratio, CI = Confidence Interval | | | | | | | |
Stacking
t1 <- glm(status_f.u~sex,
data = stroke2_complete,
family = binomial) %>%
tbl_regression(
exponentiate=TRUE,
label=list(sex ~"Gender (unadjusted)")
)
t2 <- glm(status_f.u~sex+age+dm+hpt,
data = stroke2_complete,
family = binomial) %>%
tbl_regression(
include="sex",
exponentiate=TRUE,
label=list(sex ~"Gender (adjusted)")
)
table_stack_ex1 <- tbl_stack(list(t1, t2))
table_stack_ex1
| Characteristic | OR¹ | 95% CI¹ | p-value |
|---|---|---|---|
| Gender (unadjusted) | |||
| Female | — | — | |
| Male | 0.48 | 0.28, 0.81 | 0.006 |
| Gender (adjusted) | |||
| Female | — | — | |
| Male | 0.53 | 0.30, 0.95 | 0.032 |
| ¹ OR = Odds Ratio, CI = Confidence Interval | |||
Print engine (gtsummary + R Markdown)
The gtsummary package was developed to complement the gt package from RStudio. The gt package, however, does not support all types of output. As a result, gtsummary tables can be printed using a variety of engines.
For further information, see the gtsummary + R Markdown vignette.
gtsummary + R Markdown
library(flextable)
##
## Attaching package: 'flextable'
## The following objects are masked from 'package:gtsummary':
##
## as_flextable, continuous_summary
tbl_surv_merge %>%
bold_labels() %>%
italicize_levels() %>%
  as_flex_table() # use when knitting to Microsoft Word
| Characteristic | Univariable N | Univariable HR¹ | Univariable 95% CI¹ | Univariable p-value | Multivariable HR¹ | Multivariable 95% CI¹ | Multivariable p-value |
|---|---|---|---|---|---|---|---|
| Age | 266 | 1.05 | 1.03, 1.06 | <0.001 | 1.04 | 1.02, 1.06 | <0.001 |
| Gender | 266 | | | | | | |
| Female | | — | — | | — | — | |
| Male | | 0.48 | 0.30, 0.74 | 0.001 | 0.60 | 0.38, 0.96 | 0.033 |
| Diabetes Mellitus | 266 | | | | | | |
| No | | — | — | | — | — | |
| Yes | | 1.53 | 0.97, 2.39 | 0.066 | 1.88 | 1.14, 3.10 | 0.013 |
| Hypertension | 266 | | | | | | |
| No | | — | — | | — | — | |
| Yes | | 1.32 | 0.70, 2.50 | 0.4 | 0.60 | 0.30, 1.18 | 0.14 |
| NIHSS Score | 266 | | | | | | |
| No stroke symptoms (0) | | — | — | | — | — | |
| Minor stroke (1-4) | | 0.38 | 0.15, 0.99 | 0.047 | 0.68 | 0.25, 1.88 | 0.5 |
| Moderate stroke (5-15) | | 1.32 | 0.55, 3.16 | 0.5 | 2.20 | 0.88, 5.53 | 0.092 |
| Moderate to severe stroke (16-20) | | 1.66 | 0.61, 4.50 | 0.3 | 2.64 | 0.93, 7.48 | 0.067 |
| Severe stroke (21-42) | | 4.12 | 1.56, 10.9 | 0.004 | 7.40 | 2.56, 21.4 | <0.001 |
| ¹ HR = Hazard Ratio, CI = Confidence Interval | | | | | | | |
Journal table format
theme_gtsummary_journal(journal = "lancet")
## Setting theme `The Lancet`
lancet_theme <- tbl_surv_merge %>%
bold_labels() %>%
italicize_levels() %>%
as_gt() %>%
gt::tab_header("Journal Theme (Lancet)")
lancet_theme
Journal Theme (Lancet)
| Characteristic | Univariable N | Univariable HR¹ | Univariable 95% CI¹ | Univariable p-value | Multivariable HR¹ | Multivariable 95% CI¹ | Multivariable p-value |
|---|---|---|---|---|---|---|---|
| Age | 266 | 1·05 | 1.03, 1.06 | <0·001 | 1·04 | 1.02, 1.06 | <0·001 |
| Gender | 266 | | | | | | |
| Female | | — | — | | — | — | |
| Male | | 0·48 | 0.30, 0.74 | 0·001 | 0·60 | 0.38, 0.96 | 0·033 |
| Diabetes Mellitus | 266 | | | | | | |
| No | | — | — | | — | — | |
| Yes | | 1·53 | 0.97, 2.39 | 0·066 | 1·88 | 1.14, 3.10 | 0·013 |
| Hypertension | 266 | | | | | | |
| No | | — | — | | — | — | |
| Yes | | 1·32 | 0.70, 2.50 | 0·4 | 0·60 | 0.30, 1.18 | 0·14 |
| NIHSS Score | 266 | | | | | | |
| No stroke symptoms (0) | | — | — | | — | — | |
| Minor stroke (1-4) | | 0·38 | 0.15, 0.99 | 0·047 | 0·68 | 0.25, 1.88 | 0·5 |
| Moderate stroke (5-15) | | 1·32 | 0.55, 3.16 | 0·5 | 2·20 | 0.88, 5.53 | 0·092 |
| Moderate to severe stroke (16-20) | | 1·66 | 0.61, 4.50 | 0·3 | 2·64 | 0.93, 7.48 | 0·067 |
| Severe stroke (21-42) | | 4·12 | 1.56, 10.9 | 0·004 | 7·40 | 2.56, 21.4 | <0·001 |
| ¹ HR = Hazard Ratio, CI = Confidence Interval | | | | | | | |
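A minimal sketch of setting and clearing a journal theme, on gtsummary's bundled `trial` data (tbl_lancet is an illustrative name). Here the theme is set before the table is built. That ordering is my reading of why the confidence intervals in the table above keep full stops while the estimates use the Lancet mid-dot: tbl_surv_merge was created before the theme was set. Treat that explanation as an assumption.

```r
library(gtsummary)

# Set the theme first so every table built afterwards picks it up
theme_gtsummary_journal(journal = "lancet")

tbl_lancet <- tbl_summary(
  trial,
  include = c(age, marker),   # `marker` has decimals, so the decimal mark is visible
  missing = "no"
)

tbl_lancet

# Return to the default formatting for later tables
reset_gtsummary_theme()
```

Resetting at the end keeps the journal formatting from leaking into tables created later in the same session.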
Report statistics in line
Tables are important, but sometimes a result still needs to be reported in-line in a report. This is especially true when explaining the results of regression models to the reader.
With the inline_text() function, any statistic reported in a {gtsummary} table can be extracted and reported in-line in R Markdown.
For example:
We want to report the hazard ratio for age from the cox_mv table:
| Characteristic | HR¹ | 95% CI¹ | p-value |
|---|---|---|---|
| Age | 1·04 | 1.02, 1.06 | <0·001 |
| Gender | |||
| Female | — | — | |
| Male | 0·60 | 0.38, 0.96 | 0·033 |
| Diabetes Mellitus | |||
| No | — | — | |
| Yes | 1·88 | 1.14, 3.10 | 0·013 |
| Hypertension | |||
| No | — | — | |
| Yes | 0·60 | 0.30, 1.18 | 0·14 |
| NIHSS Score | |||
| No stroke symptoms (0) | — | — | |
| Minor stroke (1-4) | 0·68 | 0.25, 1.88 | 0·5 |
| Moderate stroke (5-15) | 2·20 | 0.88, 5.53 | 0·092 |
| Moderate to severe stroke (16-20) | 2·64 | 0.93, 7.48 | 0·067 |
| Severe stroke (21-42) | 7·40 | 2.56, 21.4 | <0·001 |
| ¹ HR = Hazard Ratio, CI = Confidence Interval | |||
For every 1-year increase in age, the hazard of dying from stroke increases by 4% (HR 1·04; 95% CI 1·02, 1·06; p<0·001).
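The inline_text() call itself is not shown above. A minimal sketch on gtsummary's bundled `trial` data, where ttdeath and death are that data set's follow-up time and event indicator (cox_tbl, hr_age and hr_age_custom are illustrative names, and the model is not the stroke model):

```r
library(gtsummary)
library(survival)

# A small Cox model table to pull numbers from
cox_tbl <- coxph(Surv(ttdeath, death) ~ age + grade, data = trial) %>%
  tbl_regression(exponentiate = TRUE)

# Default pattern: estimate, confidence interval and p-value as one string
hr_age <- inline_text(cox_tbl, variable = age)

# Custom pattern built from the same pieces
hr_age_custom <- inline_text(
  cox_tbl,
  variable = age,
  pattern = "HR {estimate} (95% CI {conf.low}, {conf.high}; {p.value})"
)

hr_age
hr_age_custom
```

In an R Markdown document the call goes inside an inline chunk, for example `r inline_text(cox_tbl, variable = age)`, so the sentence updates whenever the model is refitted.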
More Details
YouTube tutorial by Daniel Sjoberg and Emily Zabor https://www.youtube.com/watch?v=tANo9E1SYJE&t=628s https://www.youtube.com/watch?v=U2S6LbMN42I&t=25s
{gtsummary} website https://www.danieldsjoberg.com/gtsummary/
Conclusion
- Tables serve as a tool for communicating discrete data or direct comparisons within a report.
- The {gtsummary} package assists researchers in producing reproducible, presentation-ready tables.